On the k-Closest Substring and k-Consensus Pattern Problems
Abstract
Given a set $S = \{s_1, s_2, \ldots, s_n\}$ of strings, each of length $m$, and an integer $L$, we study the following two problems. k-Closest Substring problem: find $k$ center strings $c_1, c_2, \ldots, c_k$ of length $L$ minimizing $d$ such that for each $s_j \in S$ there is a length-$L$ substring $t_j$ (closest substring) of $s_j$ with $\min_{1 \le i \le k} d(c_i, t_j) \le d$. We give a PTAS for this problem for $k = O(1)$. k-Consensus Pattern problem: find $k$ median strings $c_1, c_2, \ldots, c_k$ of length $L$ and a substring $t_j$ (consensus pattern) of length $L$ from each $s_j$ minimizing the total cost $w = \sum_{j=1}^{n} \min_{1 \le i \le k} d(c_i, t_j)$. We give a PTAS for this problem for $k = O(1)$. Our results improve recent results of [10] and [16], both of which depended on the random linear transformation technique in [16]. For the general-$k$ case, we give an alternative and direct proof of the NP-hardness of $(2 - \epsilon)$-approximation of the Hamming radius k-clustering problem, a special case of the k-Closest Substring problem restricted to $L = m$.
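To make the two objectives concrete, the following minimal sketch evaluates them for a fixed, given set of centers. It is only an illustration of the definitions, not the PTAS from the paper. It assumes that $d$ is the Hamming distance (the abstract names the Hamming radius only for the special case $L = m$), and the function names and the toy input are hypothetical.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings (assumed choice of d)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_distance(s: str, centers: Sequence[str]) -> int:
    """Minimum of d(c_i, t) over all centers c_i and all length-L substrings t of s."""
    L = len(centers[0])
    return min(
        hamming(c, s[p:p + L])
        for p in range(len(s) - L + 1)
        for c in centers
    )


def closest_substring_radius(strings: Sequence[str], centers: Sequence[str]) -> int:
    """k-Closest Substring objective: the smallest d that works for every s_j."""
    return max(best_distance(s, centers) for s in strings)


def consensus_pattern_cost(strings: Sequence[str], centers: Sequence[str]) -> int:
    """k-Consensus Pattern objective: the total cost w summed over all s_j."""
    return sum(best_distance(s, centers) for s in strings)


# Toy instance (illustrative only): n = 3 strings of length m = 5, k = 2, L = 3.
strings = ["aacgt", "ttagg", "ggcgg"]
centers = ["acg", "ggg"]
radius = closest_substring_radius(strings, centers)
cost = consensus_pattern_cost(strings, centers)
```

For fixed centers, each $t_j$ can be chosen independently as the substring of $s_j$ nearest to some center, so the first objective is a maximum and the second a sum of the same per-string minima. The hard part of both problems, which this sketch does not address, is choosing the centers.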
Similar resources
Closest Substring Problems with Small Distances
We study two pattern matching problems that are motivated by applications in computational biology. In the Closest Substring problem k strings s1, . . ., sk are given, and the task is to find a string s of length L such that each string si has a consecutive substring of length L whose distance is at most d from s. We present two algorithms that aim to be efficient for small fixed values of d an...
On The Parameterized Intractability Of Motif Search Problems
We show that Closest Substring, one of the most important problems in the field of consensus string analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This is done by giving a “strongly structure-preserving” reduction from the graph problem Clique to Closest Substring. This problem is therefore unlikely to be solvable in tim...
Parameterized Intractability of Motif Search Problems
We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This problem is therefore unlikely to be solvable in time $O(f(k) \cdot n^c)$ for any function f of k and constant c independent of k. The problem can therefore be expected to be i...
Parameterized Intractability of Motif Search Problems (arXiv:cs.CC/0205056v1, 21 May 2002)
We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This problem is therefore unlikely to be solvable in time $O(f(k) \cdot n^c)$ for any function f of k and constant c independent of k. The problem can therefore be expected to be i...
Hard problems in similarity searching
The Closest Substring Problem is one of the most important problems in the field of computational biology. It is stated as follows: given a set of $t$ sequences $s_1, s_2, \ldots, s_t$ over an alphabet $\Sigma$, and two integers $k, d$ with $d < k$, can one find a string $s$ of length $k$ and, for all $i = 1, 2, \ldots, t$, substrings $o_i$ of $s_i$, all of length $k$, such that $d(s, o_i) \le d$ (for all $i = 1, 2, \ldots, t$)? (here, d(·...